Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 3039687
Missing cells (%): 44.9%
Duplicate rows: 309470
Duplicate rows (%): 91.4%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B
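
The percentages and sizes above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, not the profiler's own code. It assumes that the missing-cell share is taken over all cells (variables × observations), that the duplicate share is taken over all rows, and that every column is stored as an 8-byte float.

```python
# Counts copied from the dataset statistics above.
n_variables = 20
n_observations = 338592
missing_cells = 3039687
duplicate_rows = 309470

# Assumption: the missing-cell share is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: each of the 20 numeric columns uses 8 bytes per value (float64).
record_bytes = n_variables * 8
total_mib = n_observations * record_bytes / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Record size: {record_bytes} B, total: {total_mib:.1f} MiB")
```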

Variable types

Numeric: 20

Warnings

Dataset has 309470 (91.4%) duplicate rows [Duplicates]
PayUmarenPay1 is highly correlated with PayWidePay1 [High correlation]
PayUmarenNinki1 is highly correlated with PayWideNinki1 [High correlation]
PayUmarenKumi2 is highly correlated with PayWideKumi2 [High correlation]
PayUmarenPay2 is highly correlated with PayWidePay2 [High correlation]
PayUmarenNinki2 is highly correlated with PayWideNinki2 [High correlation]
PayWidePay1 is highly correlated with PayUmarenPay1 [High correlation]
PayWideNinki1 is highly correlated with PayUmarenNinki1 [High correlation]
PayWideKumi2 is highly correlated with PayUmarenKumi2 [High correlation]
PayWidePay2 is highly correlated with PayUmarenPay2 [High correlation]
PayWideNinki2 is highly correlated with PayUmarenNinki2 [High correlation]
PayUmarenKumi2 has 337893 (99.8%) missing values [Missing]
PayUmarenPay2 has 337893 (99.8%) missing values [Missing]
PayUmarenNinki2 has 337893 (99.8%) missing values [Missing]
PayWideKumi4 has 337668 (99.7%) missing values [Missing]
PayWidePay4 has 337668 (99.7%) missing values [Missing]
PayWideNinki4 has 337668 (99.7%) missing values [Missing]
PayWideKumi5 has 337668 (99.7%) missing values [Missing]
PayWidePay5 has 337668 (99.7%) missing values [Missing]
PayWideNinki5 has 337668 (99.7%) missing values [Missing]

Reproduction

Analysis started: 2021-04-07 12:57:51.491931
Analysis finished: 2021-04-07 12:59:34.353794
Duration: 1 minute and 42.86 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

PayUmarenPay1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3244
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6210.123246
Minimum: 120
Maximum: 446550
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 120
5-th percentile: 390
Q1: 920
Median: 2010
Q3: 5180
95-th percentile: 24370
Maximum: 446550
Range: 446430
Interquartile range (IQR): 4260

Descriptive statistics

Standard deviation: 15620.03699
Coefficient of variation (CV): 2.515253944
Kurtosis: 141.3314332
Mean: 6210.123246
Median Absolute Deviation (MAD): 1360
Skewness: 9.325312704
Sum: 2102698050
Variance: 243985555.5
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
700  1662  0.5%
560  1576  0.5%
720  1512  0.4%
550  1475  0.4%
570  1459  0.4%
510  1439  0.4%
770  1432  0.4%
610  1421  0.4%
660  1405  0.4%
480  1386  0.4%
Other values (3234)  323825  95.6%

Minimum 5 values

Value  Count  Frequency (%)
120  15  < 0.1%
130  29  < 0.1%
140  87  < 0.1%
150  112  < 0.1%
160  131  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
446550  10  < 0.1%
442370  12  < 0.1%
365760  12  < 0.1%
362250  6  < 0.1%
320310  15  < 0.1%
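
Several of the figures reported for PayUmarenPay1 are simple functions of the others. As a hedged illustration using the standard textbook definitions (the profiler's exact implementation is not shown in this report), the range, interquartile range, coefficient of variation and variance can be recomputed from the reported minimum, maximum, quartiles, mean and standard deviation:

```python
# Values copied from the PayUmarenPay1 summary.
minimum, maximum = 120, 446550
q1, q3 = 920, 5180
mean = 6210.123246
std = 15620.03699

value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as Interquartile range (IQR)
cv = std / mean                   # reported as Coefficient of variation (CV)
variance = std ** 2               # reported as Variance, up to rounding of std

print(value_range, iqr, round(cv, 6), round(variance, 1))
```

A coefficient of variation above 2, together with a skewness above 9, indicates a heavily right-skewed payout distribution, which is consistent with the median (2010) lying far below the mean.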

PayUmarenNinki1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 127
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.84270745
Minimum: 1
Maximum: 146
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 7
Q3: 18
95-th percentile: 51
Maximum: 146
Range: 145
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 17.27642682
Coefficient of variation (CV): 1.248052586
Kurtosis: 6.690521534
Mean: 13.84270745
Median Absolute Deviation (MAD): 5
Skewness: 2.339511894
Sum: 4687030
Variance: 298.4749238
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1  50054  14.8%
2  32460  9.6%
3  25672  7.6%
4  20421  6.0%
5  17639  5.2%
6  15826  4.7%
7  13275  3.9%
8  11180  3.3%
9  9809  2.9%
10  9028  2.7%
Other values (117)  133228  39.3%

Minimum 5 values

Value  Count  Frequency (%)
1  50054  14.8%
2  32460  9.6%
3  25672  7.6%
4  20421  6.0%
5  17639  5.2%

Maximum 5 values

Value  Count  Frequency (%)
146  17  < 0.1%
139  18  < 0.1%
137  17  < 0.1%
136  10  < 0.1%
131  16  < 0.1%

PayUmarenKumi2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 41
Distinct (%): 5.9%
Missing: 337893
Missing (%): 99.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 686.8998569
Minimum: 104
Maximum: 1415
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 104
5-th percentile: 206
Q1: 415
Median: 710
Q3: 911
95-th percentile: 1316
Maximum: 1415
Range: 1311
Interquartile range (IQR): 496

Descriptive statistics

Standard deviation: 319.8225264
Coefficient of variation (CV): 0.4656028433
Kurtosis: -0.3410953229
Mean: 686.8998569
Median Absolute Deviation (MAD): 202
Skewness: 0.2946987027
Sum: 480143
Variance: 102286.4484
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=41)

Common values

Value  Count  Frequency (%)
811  63  < 0.1%
710  40  < 0.1%
608  29  < 0.1%
913  26  < 0.1%
411  26  < 0.1%
1315  26  < 0.1%
809  26  < 0.1%
314  25  < 0.1%
510  24  < 0.1%
611  23  < 0.1%
Other values (31)  391  0.1%
(Missing)  337893  99.8%

Minimum 5 values

Value  Count  Frequency (%)
104  8  < 0.1%
113  14  < 0.1%
203  6  < 0.1%
206  8  < 0.1%
208  12  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
1415  11  < 0.1%
1317  16  < 0.1%
1316  13  < 0.1%
1315  26  < 0.1%
1114  15  < 0.1%

PayUmarenPay2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 51
Distinct (%): 7.3%
Missing: 337893
Missing (%): 99.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2938.240343
Minimum: 160
Maximum: 28210
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 160
5-th percentile: 240
Q1: 480
Median: 1150
Q3: 3730
95-th percentile: 10800
Maximum: 28210
Range: 28050
Interquartile range (IQR): 3250

Descriptive statistics

Standard deviation: 4748.838274
Coefficient of variation (CV): 1.616218457
Kurtosis: 14.24674611
Mean: 2938.240343
Median Absolute Deviation (MAD): 800
Skewness: 3.464512378
Sum: 2053830
Variance: 22551464.95
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
400  28  < 0.1%
1700  27  < 0.1%
240  26  < 0.1%
350  26  < 0.1%
800  26  < 0.1%
480  20  < 0.1%
1030  17  < 0.1%
10800  16  < 0.1%
670  16  < 0.1%
320  15  < 0.1%
Other values (41)  482  0.1%
(Missing)  337893  99.8%

Minimum 5 values

Value  Count  Frequency (%)
160  8  < 0.1%
180  6  < 0.1%
200  14  < 0.1%
240  26  < 0.1%
320  15  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
28210  14  < 0.1%
14070  15  < 0.1%
10800  16  < 0.1%
9880  15  < 0.1%
9360  10  < 0.1%

PayUmarenNinki2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 26
Distinct (%): 3.7%
Missing: 337893
Missing (%): 99.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.38054363
Minimum: 1
Maximum: 82
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2.5
Median: 7
Q3: 22
95-th percentile: 54
Maximum: 82
Range: 81
Interquartile range (IQR): 19.5

Descriptive statistics

Standard deviation: 18.84056068
Coefficient of variation (CV): 1.224960647
Kurtosis: 3.491231674
Mean: 15.38054363
Median Absolute Deviation (MAD): 6
Skewness: 1.926330545
Sum: 10751
Variance: 354.9667269
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=26)

Common values

Value  Count  Frequency (%)
1  122  < 0.1%
3  67  < 0.1%
2  53  < 0.1%
6  37  < 0.1%
28  37  < 0.1%
5  31  < 0.1%
7  27  < 0.1%
9  26  < 0.1%
4  26  < 0.1%
16  24  < 0.1%
Other values (16)  249  0.1%
(Missing)  337893  99.8%

Minimum 5 values

Value  Count  Frequency (%)
1  122  < 0.1%
2  53  < 0.1%
3  67  < 0.1%
4  26  < 0.1%
5  31  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
82  16  < 0.1%
74  15  < 0.1%
54  10  < 0.1%
50  15  < 0.1%
49  14  < 0.1%

PayWideKumi1
Real number (ℝ≥0)

Distinct: 153
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 549.9749994
Minimum: 102
Maximum: 1718
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 102
5-th percentile: 106
Q1: 214
Median: 507
Q3: 811
95-th percentile: 1215
Maximum: 1718
Range: 1616
Interquartile range (IQR): 597

Descriptive statistics

Standard deviation: 358.6424735
Coefficient of variation (CV): 0.6521068666
Kurtosis: -0.3568882159
Mean: 549.9749994
Median Absolute Deviation (MAD): 296
Skewness: 0.6847124509
Sum: 186217135
Variance: 128624.4238
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
203  4381  1.3%
204  4275  1.3%
405  4220  1.2%
102  4151  1.2%
406  4082  1.2%
206  4064  1.2%
304  4045  1.2%
105  4032  1.2%
104  3990  1.2%
407  3943  1.2%
Other values (143)  297409  87.8%

Minimum 5 values

Value  Count  Frequency (%)
102  4151  1.2%
103  3864  1.1%
104  3990  1.2%
105  4032  1.2%
106  3424  1.0%

Maximum 5 values

Value  Count  Frequency (%)
1718  325  0.1%
1618  246  0.1%
1617  223  0.1%
1518  368  0.1%
1517  207  0.1%

PayWidePay1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1410
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1651.143264
Minimum: 110
Maximum: 124860
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 110
5-th percentile: 190
Q1: 380
Median: 720
Q3: 1590
95-th percentile: 5900
Maximum: 124860
Range: 124750
Interquartile range (IQR): 1210

Descriptive statistics

Standard deviation: 3313.864679
Coefficient of variation (CV): 2.007012203
Kurtosis: 143.5580337
Mean: 1651.143264
Median Absolute Deviation (MAD): 430
Skewness: 8.727772953
Sum: 559063900
Variance: 10981699.11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
280  4061  1.2%
300  3936  1.2%
290  3882  1.1%
270  3828  1.1%
320  3793  1.1%
350  3764  1.1%
330  3686  1.1%
260  3630  1.1%
370  3575  1.1%
250  3510  1.0%
Other values (1400)  300927  88.9%

Minimum 5 values

Value  Count  Frequency (%)
110  486  0.1%
120  567  0.2%
130  1081  0.3%
140  1931  0.6%
150  2064  0.6%

Maximum 5 values

Value  Count  Frequency (%)
124860  10  < 0.1%
75200  12  < 0.1%
71630  17  < 0.1%
70440  12  < 0.1%
69240  13  < 0.1%

PayWideNinki1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 128
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.94676484
Minimum: 1
Maximum: 145
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 7
Q3: 18
95-th percentile: 51
Maximum: 145
Range: 144
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 17.35086984
Coefficient of variation (CV): 1.244078469
Kurtosis: 6.573190506
Mean: 13.94676484
Median Absolute Deviation (MAD): 5
Skewness: 2.323066423
Sum: 4722263
Variance: 301.0526843
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1  48821  14.4%
2  32755  9.7%
3  25344  7.5%
4  20867  6.2%
5  17482  5.2%
6  15678  4.6%
7  12726  3.8%
8  12039  3.6%
9  10274  3.0%
10  8827  2.6%
Other values (118)  133779  39.5%

Minimum 5 values

Value  Count  Frequency (%)
1  48821  14.4%
2  32755  9.7%
3  25344  7.5%
4  20867  6.2%
5  17482  5.2%

Maximum 5 values

Value  Count  Frequency (%)
145  17  < 0.1%
144  10  < 0.1%
133  43  < 0.1%
131  30  < 0.1%
130  17  < 0.1%

PayWideKumi2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 153
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 538.3800326
Minimum: 102
Maximum: 1718
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 102
5-th percentile: 106
Q1: 213
Median: 506
Q3: 809
95-th percentile: 1215
Maximum: 1718
Range: 1616
Interquartile range (IQR): 596

Descriptive statistics

Standard deviation: 354.9315621
Coefficient of variation (CV): 0.6592584059
Kurtosis: -0.2486483404
Mean: 538.3800326
Median Absolute Deviation (MAD): 294
Skewness: 0.7344644814
Sum: 182291172
Variance: 125976.4137
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
203  4342  1.3%
104  4287  1.3%
102  4281  1.3%
103  4227  1.2%
406  4081  1.2%
304  4062  1.2%
207  3961  1.2%
607  3939  1.2%
405  3917  1.2%
107  3915  1.2%
Other values (143)  297580  87.9%

Minimum 5 values

Value  Count  Frequency (%)
102  4281  1.3%
103  4227  1.2%
104  4287  1.3%
105  3766  1.1%
106  3889  1.1%

Maximum 5 values

Value  Count  Frequency (%)
1718  227  0.1%
1618  257  0.1%
1617  353  0.1%
1518  243  0.1%
1517  302  0.1%

PayWidePay2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1584
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2023.586145
Minimum: 110
Maximum: 129000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 110
5-th percentile: 220
Q1: 460
Median: 890
Q3: 1960
95-th percentile: 7290
Maximum: 129000
Range: 128890
Interquartile range (IQR): 1500

Descriptive statistics

Standard deviation: 3919.778376
Coefficient of variation (CV): 1.93704547
Kurtosis: 119.0028255
Mean: 2023.586145
Median Absolute Deviation (MAD): 540
Skewness: 7.998962791
Sum: 685170080
Variance: 15364662.52
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
320  3271  1.0%
410  3228  1.0%
390  3196  0.9%
400  3172  0.9%
280  3154  0.9%
270  3141  0.9%
360  3114  0.9%
380  3096  0.9%
370  3067  0.9%
290  2997  0.9%
Other values (1574)  307156  90.7%

Minimum 5 values

Value  Count  Frequency (%)
110  148  < 0.1%
120  385  0.1%
130  543  0.2%
140  890  0.3%
150  1187  0.4%

Maximum 5 values

Value  Count  Frequency (%)
129000  12  < 0.1%
96560  11  < 0.1%
85380  12  < 0.1%
84810  11  < 0.1%
81520  14  < 0.1%

PayWideNinki2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 131
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.6721157
Minimum: 1
Maximum: 148
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
Median: 10
Q3: 22
95-th percentile: 57
Maximum: 148
Range: 147
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 18.87688989
Coefficient of variation (CV): 1.132243215
Kurtosis: 5.167001671
Mean: 16.6721157
Median Absolute Deviation (MAD): 7
Skewness: 2.089936533
Sum: 5645045
Variance: 356.3369718
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1  30802  9.1%
2  25245  7.5%
3  23113  6.8%
4  19631  5.8%
5  16773  5.0%
6  14683  4.3%
7  13235  3.9%
8  11927  3.5%
9  11178  3.3%
10  9694  2.9%
Other values (121)  162311  47.9%

Minimum 5 values

Value  Count  Frequency (%)
1  30802  9.1%
2  25245  7.5%
3  23113  6.8%
4  19631  5.8%
5  16773  5.0%

Maximum 5 values

Value  Count  Frequency (%)
148  16  < 0.1%
140  17  < 0.1%
137  31  < 0.1%
136  18  < 0.1%
134  16  < 0.1%

PayWideKumi3
Real number (ℝ≥0)

Distinct: 153
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 539.3373057
Minimum: 102
Maximum: 1718
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 102
5-th percentile: 106
Q1: 213
Median: 506
Q3: 809
95-th percentile: 1215
Maximum: 1718
Range: 1616
Interquartile range (IQR): 596

Descriptive statistics

Standard deviation: 353.4615192
Coefficient of variation (CV): 0.6553626376
Kurtosis: -0.246561507
Mean: 539.3373057
Median Absolute Deviation (MAD): 293
Skewness: 0.731560387
Sum: 182615297
Variance: 124935.0456
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
103  4299  1.3%
608  4242  1.3%
204  4193  1.2%
206  4108  1.2%
305  4087  1.2%
104  4009  1.2%
506  3962  1.2%
304  3957  1.2%
203  3952  1.2%
708  3928  1.2%
Other values (143)  297855  88.0%

Minimum 5 values

Value  Count  Frequency (%)
102  3515  1.0%
103  4299  1.3%
104  4009  1.2%
105  3854  1.1%
106  3803  1.1%

Maximum 5 values

Value  Count  Frequency (%)
1718  299  0.1%
1618  356  0.1%
1617  251  0.1%
1518  303  0.1%
1517  248  0.1%

PayWidePay3
Real number (ℝ≥0)

Distinct: 1878
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2616.699804
Minimum: 110
Maximum: 117910
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 110
5-th percentile: 250
Q1: 550
Median: 1110
Q3: 2560
95-th percentile: 9800
Maximum: 117910
Range: 117800
Interquartile range (IQR): 2010

Descriptive statistics

Standard deviation: 5079.497338
Coefficient of variation (CV): 1.941184591
Kurtosis: 97.31017237
Mean: 2616.699804
Median Absolute Deviation (MAD): 700
Skewness: 7.547367882
Sum: 885993620
Variance: 25801293.21
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
370  2654  0.8%
410  2645  0.8%
400  2641  0.8%
380  2628  0.8%
390  2610  0.8%
350  2598  0.8%
340  2577  0.8%
420  2569  0.8%
430  2437  0.7%
310  2378  0.7%
Other values (1868)  312855  92.4%

Minimum 5 values

Value  Count  Frequency (%)
110  56  < 0.1%
120  87  < 0.1%
130  267  0.1%
140  552  0.2%
150  581  0.2%

Maximum 5 values

Value  Count  Frequency (%)
117910  16  < 0.1%
109100  16  < 0.1%
106360  10  < 0.1%
105990  15  < 0.1%
105110  14  < 0.1%

PayWideNinki3
Real number (ℝ≥0)

Distinct: 139
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.26387511
Minimum: 1
Maximum: 147
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
Median: 13
Q3: 28
95-th percentile: 65
Maximum: 147
Range: 146
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 21.03656246
Coefficient of variation (CV): 1.038131273
Kurtosis: 3.774789036
Mean: 20.26387511
Median Absolute Deviation (MAD): 9
Skewness: 1.823504803
Sum: 6861186
Variance: 442.53696
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1  20348  6.0%
2  18474  5.5%
3  17725  5.2%
4  16211  4.8%
5  15604  4.6%
6  13507  4.0%
7  13165  3.9%
8  11456  3.4%
9  11194  3.3%
10  9945  2.9%
Other values (129)  190963  56.4%

Minimum 5 values

Value  Count  Frequency (%)
1  20348  6.0%
2  18474  5.5%
3  17725  5.2%
4  16211  4.8%
5  15604  4.6%

Maximum 5 values

Value  Count  Frequency (%)
147  10  < 0.1%
146  17  < 0.1%
145  9  < 0.1%
143  14  < 0.1%
142  31  < 0.1%

PayWideKumi4
Real number (ℝ≥0)

MISSING

Distinct: 56
Distinct (%): 6.1%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 444.2835498
Minimum: 102
Maximum: 1314
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 102
5-th percentile: 104
Q1: 203
Median: 312
Q3: 710
95-th percentile: 1012
Maximum: 1314
Range: 1212
Interquartile range (IQR): 507

Descriptive statistics

Standard deviation: 324.1941116
Coefficient of variation (CV): 0.7297009123
Kurtosis: -0.4754475071
Mean: 444.2835498
Median Absolute Deviation (MAD): 200
Skewness: 0.7684466034
Sum: 410518
Variance: 105101.822
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
206  33  < 0.1%
510  32  < 0.1%
405  32  < 0.1%
205  30  < 0.1%
103  29  < 0.1%
711  26  < 0.1%
911  24  < 0.1%
203  24  < 0.1%
709  24  < 0.1%
809  24  < 0.1%
Other values (46)  646  0.2%
(Missing)  337668  99.7%

Minimum 5 values

Value  Count  Frequency (%)
102  6  < 0.1%
103  29  < 0.1%
104  17  < 0.1%
105  15  < 0.1%
106  8  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
1314  16  < 0.1%
1213  15  < 0.1%
1015  13  < 0.1%
1012  13  < 0.1%
915  11  < 0.1%

PayWidePay4
Real number (ℝ≥0)

MISSING

Distinct: 62
Distinct (%): 6.7%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1434.458874
Minimum: 130
Maximum: 15710
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 130
5-th percentile: 240
Q1: 480
Median: 770
Q3: 1450
95-th percentile: 5080
Maximum: 15710
Range: 15580
Interquartile range (IQR): 970

Descriptive statistics

Standard deviation: 2252.913469
Coefficient of variation (CV): 1.57056679
Kurtosis: 25.74087869
Mean: 1434.458874
Median Absolute Deviation (MAD): 350
Skewness: 4.702483819
Sum: 1325440
Variance: 5075619.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1090  39  < 0.1%
710  32  < 0.1%
510  28  < 0.1%
240  28  < 0.1%
480  27  < 0.1%
260  27  < 0.1%
450  26  < 0.1%
490  26  < 0.1%
530  25  < 0.1%
570  21  < 0.1%
Other values (52)  645  0.2%
(Missing)  337668  99.7%

Minimum 5 values

Value  Count  Frequency (%)
130  6  < 0.1%
170  5  < 0.1%
190  18  < 0.1%
240  28  < 0.1%
250  11  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
15710  16  < 0.1%
6070  12  < 0.1%
5890  13  < 0.1%
5080  17  < 0.1%
3830  9  < 0.1%

PayWideNinki4
Real number (ℝ≥0)

MISSING

Distinct: 36
Distinct (%): 3.9%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.93506494
Minimum: 1
Maximum: 87
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.15
Q1: 8
Median: 14
Q3: 26
95-th percentile: 57
Maximum: 87
Range: 86
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 17.51644032
Coefficient of variation (CV): 0.8786748563
Kurtosis: 2.683227212
Mean: 19.93506494
Median Absolute Deviation (MAD): 7
Skewness: 1.64183593
Sum: 18420
Variance: 306.8256814
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=36)

Common values

Value  Count  Frequency (%)
14  77  < 0.1%
7  58  < 0.1%
9  58  < 0.1%
17  58  < 0.1%
8  55  < 0.1%
10  53  < 0.1%
3  43  < 0.1%
18  36  < 0.1%
6  34  < 0.1%
49  29  < 0.1%
Other values (26)  423  0.1%
(Missing)  337668  99.7%

Minimum 5 values

Value  Count  Frequency (%)
1  28  < 0.1%
2  19  < 0.1%
3  43  < 0.1%
4  11  < 0.1%
5  8  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
87  16  < 0.1%
62  13  < 0.1%
61  12  < 0.1%
57  17  < 0.1%
50  15  < 0.1%

PayWideKumi5
Real number (ℝ≥0)

MISSING

Distinct: 61
Distinct (%): 6.6%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 687.025974
Minimum: 105
Maximum: 1516
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 105
5-th percentile: 116
Q1: 309
Median: 709
Q3: 1013
95-th percentile: 1416
Maximum: 1516
Range: 1411
Interquartile range (IQR): 704

Descriptive statistics

Standard deviation: 403.9949977
Coefficient of variation (CV): 0.5880345328
Kurtosis: -0.9347520319
Mean: 687.025974
Median Absolute Deviation (MAD): 304
Skewness: 0.3636459561
Sum: 634812
Variance: 163211.9582
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1416  31  < 0.1%
1415  31  < 0.1%
514  29  < 0.1%
1013  29  < 0.1%
214  28  < 0.1%
711  27  < 0.1%
309  26  < 0.1%
915  25  < 0.1%
408  23  < 0.1%
307  22  < 0.1%
Other values (51)  653  0.2%
(Missing)  337668  99.7%

Minimum 5 values

Value  Count  Frequency (%)
105  6  < 0.1%
107  15  < 0.1%
109  10  < 0.1%
112  9  < 0.1%
116  16  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
1516  11  < 0.1%
1418  17  < 0.1%
1416  31  < 0.1%
1415  31  < 0.1%
1315  15  < 0.1%

PayWidePay5
Real number (ℝ≥0)

MISSING

Distinct: 69
Distinct (%): 7.5%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1874.87013
Minimum: 150
Maximum: 10360
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 150
5-th percentile: 220
Q1: 530
Median: 1050
Q3: 2010
95-th percentile: 7500
Maximum: 10360
Range: 10210
Interquartile range (IQR): 1480

Descriptive statistics

Standard deviation: 2296.960421
Coefficient of variation (CV): 1.225130416
Kurtosis: 4.888568443
Mean: 1874.87013
Median Absolute Deviation (MAD): 650
Skewness: 2.289966801
Sum: 1732380
Variance: 5276027.177
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
290  29  < 0.1%
380  29  < 0.1%
220  23  < 0.1%
1050  21  < 0.1%
2390  19  < 0.1%
730  19  < 0.1%
6490  18  < 0.1%
1470  18  < 0.1%
530  18  < 0.1%
2310  17  < 0.1%
Other values (59)  713  0.2%
(Missing)  337668  99.7%

Minimum 5 values

Value  Count  Frequency (%)
150  4  < 0.1%
160  5  < 0.1%
170  7  < 0.1%
180  16  < 0.1%
200  13  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
10360  15  < 0.1%
10270  16  < 0.1%
8090  11  < 0.1%
7500  15  < 0.1%
6490  18  < 0.1%

PayWideNinki5
Real number (ℝ≥0)

MISSING

Distinct: 43
Distinct (%): 4.7%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.48917749
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 8
Median: 22
Q3: 37
95-th percentile: 71
Maximum: 90
Range: 89
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 22.00686615
Coefficient of variation (CV): 0.8307870701
Kurtosis: 0.136074914
Mean: 26.48917749
Median Absolute Deviation (MAD): 14
Skewness: 0.9901631021
Sum: 24476
Variance: 484.3021579
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=43)

Common values

Value  Count  Frequency (%)
15  82  < 0.1%
4  49  < 0.1%
22  49  < 0.1%
7  49  < 0.1%
13  42  < 0.1%
1  42  < 0.1%
14  38  < 0.1%
2  30  < 0.1%
5  30  < 0.1%
24  27  < 0.1%
Other values (33)  486  0.1%
(Missing)  337668  99.7%

Minimum 5 values

Value  Count  Frequency (%)
1  42  < 0.1%
2  30  < 0.1%
3  25  < 0.1%
4  49  < 0.1%
5  30  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
90  15  < 0.1%
76  15  < 0.1%
72  16  < 0.1%
71  16  < 0.1%
68  23  < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
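
The definition above can be sketched in a few lines of standard-library Python. This is an illustrative implementation of the textbook formula, not the code the profiler runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    # A constant input has zero standard deviation, so r is undefined (ZeroDivisionError).
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r_original = pearson_r(x, y)
# Shifting and rescaling y by a positive factor leaves r unchanged.
r_rescaled = pearson_r(x, [3.0 * v + 7.0 for v in y])
print(r_original, r_rescaled)
```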

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
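
As a hedged sketch of this calculation, the snippet below ranks both variables and then applies the Pearson formula to the ranks. Assigning tied values the average of their positions is an assumption of this example (it is the common convention); the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship: y = x**3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because the cubic relationship is strictly increasing, the ranks of x and y coincide, so ρ reaches its maximum even though the relationship is not linear.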

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
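
The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above, which is usually called tau-a: tied pairs count as neither concordant nor discordant but still appear in the denominator. Library implementations often use a tie-corrected variant (tau-b) instead, so their results can differ when ties are present. The function name `kendall_tau` is chosen for this example.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs (tau-a)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction between i and j.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Two concordant pairs and one discordant pair out of three.
tau = kendall_tau([1, 2, 3], [1, 3, 2])
print(tau)
```

This direct version compares every pair and therefore needs O(n²) time, which is fine for an illustration but slow for a table with hundreds of thousands of rows.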

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index)  PayUmarenPay1  PayUmarenNinki1  PayUmarenKumi2  PayUmarenPay2  PayUmarenNinki2  PayWideKumi1  PayWidePay1  PayWideNinki1  PayWideKumi2  PayWidePay2  PayWideNinki2  PayWideKumi3  PayWidePay3  PayWideNinki3  PayWideKumi4  PayWidePay4  PayWideNinki4  PayWideKumi5  PayWidePay5  PayWideNinki5
0  1490  4  nan  nan  nan  1516  600  4  516  850  10  515  420  2  nan  nan  nan  nan  nan  nan
1  800  2  nan  nan  nan  809  320  2  108  240  1  109  470  6  nan  nan  nan  nan  nan  nan
2  420  1  nan  nan  nan  408  210  1  405  230  2  508  660  10  nan  nan  nan  nan  nan  nan
3  30420  78  nan  nan  nan  1516  8310  88  615  1150  7  616  17790  131  nan  nan  nan  nan  nan  nan
4  1490  4  nan  nan  nan  1516  600  4  516  850  10  515  420  2  nan  nan  nan  nan  nan  nan
5  7100  28  nan  nan  nan  912  1540  17  916  260  1  1216  1530  16  nan  nan  nan  nan  nan  nan
6  8380  27  nan  nan  nan  108  1970  24  112  3330  35  812  4960  41  nan  nan  nan  nan  nan  nan
7  400  1  nan  nan  nan  616  220  1  716  3110  28  607  5510  55  nan  nan  nan  nan  nan  nan
8  36280  66  nan  nan  nan  1416  7260  62  1214  15440  85  1216  29450  107  nan  nan  nan  nan  nan  nan
9  650  1  nan  nan  nan  814  330  1  1114  390  4  811  1090  11  nan  nan  nan  nan  nan  nan

Last rows

(index)  PayUmarenPay1  PayUmarenNinki1  PayUmarenKumi2  PayUmarenPay2  PayUmarenNinki2  PayWideKumi1  PayWidePay1  PayWideNinki1  PayWideKumi2  PayWidePay2  PayWideNinki2  PayWideKumi3  PayWidePay3  PayWideNinki3  PayWideKumi4  PayWidePay4  PayWideNinki4  PayWideKumi5  PayWidePay5  PayWideNinki5
338582  240  1  nan  nan  nan  205  140  1  208  620  7  508  1100  10  nan  nan  nan  nan  nan  nan
338583  240  1  nan  nan  nan  205  140  1  208  620  7  508  1100  10  nan  nan  nan  nan  nan  nan
338584  6570  22  nan  nan  nan  213  2090  25  102  1040  9  113  1140  12  nan  nan  nan  nan  nan  nan
338585  17730  43  nan  nan  nan  815  5670  48  608  1830  19  615  3540  36  nan  nan  nan  nan  nan  nan
338586  870  1  nan  nan  nan  204  490  2  412  1040  14  212  740  9  nan  nan  nan  nan  nan  nan
338587  900  4  nan  nan  nan  810  410  5  110  2730  25  108  3160  27  nan  nan  nan  nan  nan  nan
338588  1640  7  nan  nan  nan  610  480  4  210  380  2  206  430  3  nan  nan  nan  nan  nan  nan
338589  630  1  nan  nan  nan  714  270  1  607  490  6  614  580  8  nan  nan  nan  nan  nan  nan
338590  6250  20  nan  nan  nan  509  1700  17  405  3340  34  409  7910  60  nan  nan  nan  nan  nan  nan
338591  1160  3  nan  nan  nan  910  570  5  1014  7050  56  914  10720  69  nan  nan  nan  nan  nan  nan